import pandas as pd
import matplotlib.pyplot as plt
import bambi as bmb
import seaborn as sns
from pygam.datasets import mcycle
from pygam import LinearGAM, s, f, GAM, l, utils
import numpy as np
import arviz as az
import pymc as pm
random_seed = 100
import warnings
warnings.simplefilter('ignore')GAMs and GPs: Flexibility and Calibration
Flexible Spline models risk overfit and need to be carefully calibrated against real data.
In this blog post we’re going to dive into modelling of non-linear functions and explore some of the tooling available in the python eco-system. We’ll start by looking into Generalised Additive Models with splines in pyGAM before preceding to look Bayesian versions of spline modelling comparing the splines to Gaussian processes in Bambi and PyMC. Before showing how hierarchical bayesian models avoid some of the issues of overfit in simpler spline models.
Our interest in these models stems from their flexibility to approximate functions of arbitrary complexity. We’ll see how the methods work in the case of relatively straightforward toy example and then we’ll apply each of the methods to deriving insights into the functional form of insurance loss curves. In this application we adapt a data set discussed in Mick Cooney’s Stan case study to demonstrate the power of hierarchical spline models. Throughout we’ll draw on the discussion of these methods in Osvaldo Martin’s “Bayesian Analysis with Python” for practical details implementing these models.
All of these methods need to be assessed with respect to their in-sample model fit and their out of sample performance. How can we best calibrate the model fits to perform reasonably well out of sample?
Generalised Additive models
The canonical reference for GAMs is Simon Wood’s “Generalised Additive Models: An Introduction with R” which outlines in some detail the theoretical background of splines and univariate smoothers. The book stresses the trade-offs between the flexibility of splines and the need for cross-validation and penalised estimation methods for spline based modelling.
In R these penalised models fits can be achieved in mgcv which incorporates a wilkinson like formula syntax for model specification: y ~ s(x) + s(x1). The closest implementation in python is available in PyGam and we will adopt this package to illustrate the application of smoothing terms and the penalities.
PyGAM and Penalised Fits
In the next code block we load in an example data set on which to demonstrate univariate smoothing patterns using penalised splines. These models can be optimised by fitting differing strength penalities over a varying number of splines
X, y = mcycle(return_X_y=True)
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(X, y)
ax.set_ylabel("Acceleration")
ax.set_xlabel("Time Step")
ax.set_title("Crash Test Dummy Acceleration \n Simulated Motorcycle Crash", fontsize=20)Text(0.5, 1.0, 'Crash Test Dummy Acceleration \n Simulated Motorcycle Crash')
Next we fit a number of different models to account for the herky-jerky of the data generating processs. We vary the parameterisations to see how the numbers of splines and strength of the penalty help account for the variation in \(y\) over the support of \(X\).
gam1 = LinearGAM(s(0, n_splines=5)).fit(X, y)
gam2 = LinearGAM(s(0, n_splines=7)).fit(X, y)
gam3 = LinearGAM(s(0, n_splines=10)).fit(X, y)
gam4 = LinearGAM(s(0, n_splines=15)).fit(X, y)
gam5 = LinearGAM(s(0, lam=.1)).fit(X, y)
gam6 = LinearGAM(s(0, lam=.5)).fit(X, y)
gam7 = LinearGAM(s(0, lam=5)).fit(X, y)
gam8 = LinearGAM(s(0, lam=15)).fit(X, y)
def plot_fit(gam, X, y, ax, t, c1='b', c2='r'):
XX = gam.generate_X_grid(term=0, n=500)
ax.plot(XX, gam.predict(XX), color=c2, linestyle='--')
ax.plot(XX, gam.prediction_intervals(XX, width=.95), color=c1, ls='--')
ax.scatter(X, y, facecolor='gray', edgecolors='none')
ax.set_title(f"""95% prediction interval with {t} \n LL: {gam.statistics_['loglikelihood']}""");
fig, axs = plt.subplots(4,2, figsize=(10, 20))
axs = axs.flatten()
titles = ['5_splines', '7_splines', '10_splines', '15_splines',
'lam=.1', 'lam=.5', 'lam=5', 'lam=15']
gs = [gam1, gam2, gam3, gam4, gam5, gam6, gam7, gam8]
for ax, g, t in zip(axs, gs, titles):
plot_fit(g, X, y, ax, t)Here we’ve seen the PyGAM package applied to fitting our model to the data. In the formula specification we see y ~ s(i) where i denotes the index of the column variable in the X data.
Over the range of of the x-axis we can see how the conditional expectation is more or less well fit to the data depending on how the penalities and complexity of the model is specified.
Optimising The Parameter Setting
We can see from the model summary what is going on under the hood. For a given model specification the summary will report a number of model-fit statistics such as the log-likelihood and the AIC.
## Naive Model manually specified splines
gam_raw = LinearGAM(s(0)).fit(X, y)
print("log_likelihood:", gam_raw.statistics_['loglikelihood'])
print("AIC:", gam_raw.statistics_['AIC'])log_likelihood: -954.5558010528869
AIC: 1931.5965064139743
The question then becomes what changes are induced in the model as we seek to optimise these model fit statistics.
## model optimised
gam = LinearGAM(s(0), fit_intercept=False)
gam.gridsearch(X, y)
gam.summary() 0% (0 of 11) | | Elapsed Time: 0:00:00 ETA: --:--:--
100% (11 of 11) |########################| Elapsed Time: 0:00:00 Time: 0:00:00
LinearGAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 11.8135
Link Function: IdentityLink Log Likelihood: -952.2409
Number of Samples: 133 AIC: 1930.1088
AICc: 1933.0789
GCV: 609.3811
Scale: 512.7965
Pseudo R-Squared: 0.7984
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.2512] 20 11.8 1.11e-16 ***
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
Fortunately, this routine can be performed directly and results in the following differences between the naive and optimised model.
Plot the GAM fits
fig, ax = plt.subplots(figsize=(10, 6))
plot_fit(gam_raw, X, y, ax, "Unoptimised Fit", c1='orange', c2='green')
plot_fit(gam, X, y, ax, "Optimised Fit")This is all well and good. We’ve seen an approach to modelling that can capture eccentric patterns in raw data. But how does it work and why should we care?
If you’re familiar with the nomenclature of machine learning, you should think of spline modelling as a variety of feature creation. It generates “synthetic” features over the range of the observed variable. These synthetic features are the splines in question.
Digression on the usage of “Splines”
The history of the term “spline” is related to the history of draftmanship. Historically splines were thin strips of flexible wood or plastic that could be bent or shaped around a weight or “knot” points to express a traceable curve over the space of a “numberline”. The elastic nature of the spline material allowed it to be bent around the knot points of curvature expressing a smooth or continuous bend.
The mathematical technique apes these properties by defining a curve over in an analogous way. We specify “knots” to carve up the support of the random variable \(X\) into portions that require different weighting schemes to represent the outcome \(y\)
Extracting the Splines
We can extract the spline features used in the PyGAM by invoking the following commands to first identify the knot points and create the b-spline basis appropriate for the variable \(X\).
knot_edges=utils.gen_edge_knots(X,dtype='numerical')
knots=np.linspace(knot_edges[0],knot_edges[-1],len(gam.coef_))
splines = utils.b_spline_basis(X, edge_knots=knot_edges, sparse=False)
splines_df = pd.DataFrame(splines, columns=[f'basis_{i}' for i in range(len(gam.coef_))])
splines_df.head(10)| basis_0 | basis_1 | basis_2 | basis_3 | basis_4 | basis_5 | basis_6 | basis_7 | basis_8 | basis_9 | basis_10 | basis_11 | basis_12 | basis_13 | basis_14 | basis_15 | basis_16 | basis_17 | basis_18 | basis_19 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.166667 | 0.666667 | 0.166667 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.132997 | 0.661606 | 0.205334 | 0.000063 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.059688 | 0.594827 | 0.341426 | 0.004059 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.030095 | 0.518726 | 0.437481 | 0.013698 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.012374 | 0.428013 | 0.527144 | 0.032470 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.000000 | 0.040337 | 0.551431 | 0.399315 | 0.008917 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 0.000000 | 0.018232 | 0.465467 | 0.492630 | 0.023671 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.000000 | 0.011137 | 0.418489 | 0.535407 | 0.034967 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 0.000000 | 0.000014 | 0.189310 | 0.664817 | 0.145859 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | 0.000000 | 0.000000 | 0.120914 | 0.656897 | 0.222015 | 0.000174 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
These spline features range the extent of the of covariate space \(X\) defining “partitions” of the space. The model “learns” to capture the shape of the outcome variable \(y\) by figuring out how to weight the different portion of this spline basis matrix i.e. the linear combination of this basis matrix with the derived coefficients is a model of our outcome variable.
Note how each row is 0 everywhere except within the columns that represent a partition of \(X\). Additionally each row sum to unity. These properties are important because they ensure that any weighted combination of this basis represents the outcome variable in a controlled and quite granular manner. The more splines we use the more control we have of the representation.
Plotting the Weighted Spline
ax = splines_df.dot(gam.coef_).plot(title='Weighted Splines', label='Weighted Combination of Spline Basis', figsize=(10, 6))
ax.set_ylabel("Acceleration")
ax.set_xlabel("Time Steps")
ax.legend();In this manner we can see how the specification of a spline basis can help us model eccentric curves in an outcome space.
Next we’ll see how to more directly work with the specification of basis splines, passing these feature matrices into Bambi models.
Bayesian Splines with bambi
Under the hood Bambi makes use of the patsy package formula syntax to specify spline basis terms.
knots_6 = np.linspace(0, np.max(X), 6+2)[1:-1]
knots_10 = np.linspace(0, np.max(X), 10+2)[1:-1]
knots_15 = np.linspace(0, np.max(X), 15+2)[1:-1]
df = pd.DataFrame({'X': X.flatten(), 'y': y})
formula1 = 'bs(X, degree=0, knots=knots_6)'
formula2 = 'bs(X, degree=1, knots=knots_6, intercept=True)'
formula3 = 'bs(X, degree=3, knots=knots_6, intercept=True)'
formula4 = 'bs(X, degree=3, knots=knots_10, intercept=True)'
formula5 = 'bs(X, degree=3, knots=knots_15, intercept=True)'
model_spline1 = bmb.Model(f"y ~ {formula1}", df)
model_spline2 = bmb.Model(f"y ~ {formula2}", df)
model_spline3 = bmb.Model(f"y ~ {formula3}", df)
model_spline4 = bmb.Model(f"y ~ {formula4}", df)
model_spline5 = bmb.Model(f"y ~ {formula5}", df)
model_spline5 Formula: y ~ bs(X, degree=3, knots=knots_15, intercept=True)
Family: gaussian
Link: mu = identity
Observations: 133
Priors:
target = mu
Common-level effects
Intercept ~ Normal(mu: -25.5459, sigma: 226.5005)
bs(X, degree=3, knots=knots_15, intercept=True) ~ Normal(mu: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0.], sigma: [1209.5913 1309.9059 1516.6254 914.7421 803.2567
526.1027 542.4563
739.4915 590.0538 692.802 786.9096 707.6947 852.6529 792.6311
1046.3175 1246.4855 1367.9865 1595.1739 1391.2518])
Auxiliary parameters
sigma ~ HalfStudentT(nu: 4.0, sigma: 48.14)
As you can see here we are specifying a range of different degrees of spline basis. The different degrees corresspond to the smoothness of the overlapping splines. The degree=0 splines mean we specify a piecewise constant basis i.e. 0 or 1 within each region of the partition. But we can add more degrees to see more flexible representations of the space.
model_spline5.build()
model_spline5.graph()This will become clearer as we plot the various spline basis matrices below.
Plot the Spline Basis
The below functions extract the basis specification from each model and plots the basis design for an increasingly complex series of spline basis matrices.
def plot_spline_basis(basis, X, ax, title="Spline Basis"):
df = (
pd.DataFrame(basis)
.assign(X=X)
.melt("X", var_name="basis_idx", value_name="y")
)
for idx in df.basis_idx.unique():
d = df[df.basis_idx == idx]
ax.plot(d["X"], d["y"])
ax.set_title(title)
return ax
def plot_knots(knots, ax):
for knot in knots:
ax.axvline(knot, color="0.1", alpha=0.4)
return ax
fig, axs = plt.subplots(5, 1, figsize=(9, 20))
axs = axs.flatten()
axs.flatten()
B1 = model_spline1.response_component.design.common[formula1]
plot_spline_basis(B1, df["X"].values, ax=axs[0], title="Piecewise Constant Basis")
plot_knots(knots_6, axs[0]);
B2 = model_spline2.response_component.design.common[formula2]
ax = plot_spline_basis(B2, df["X"].values, axs[1],
title="Piecewise Linear Basis")
plot_knots(knots_6, axs[1]);
B3 = model_spline3.response_component.design.common[formula3]
ax = plot_spline_basis(B3, df["X"].values, axs[2],
title="Cubic Spline Basis (6 Knots)")
plot_knots(knots_6, axs[2]);
B4 = model_spline4.response_component.design.common[formula4]
ax = plot_spline_basis(B4, df["X"].values, axs[3],
title="Cubic Spline Basis (10 Knots)")
plot_knots(knots_10, axs[3]);
B5 = model_spline5.response_component.design.common[formula5]
ax = plot_spline_basis(B5, df["X"].values, axs[4],
title="Cubic Spline Basis (15 Knots)")
plot_knots(knots_15, axs[4]);Now that we’ve seen the nature of the modelling splines we’ll fit each model to the data and plot how the weighted splines matrices are able to represent the raw data.
Fit the Individual Spline Models
idata_spline1 = model_spline1.fit(random_seed=random_seed, idata_kwargs={"log_likelihood": True})
idata_spline2 = model_spline2.fit(random_seed=random_seed, idata_kwargs={"log_likelihood": True})
idata_spline3 = model_spline3.fit(random_seed=random_seed, idata_kwargs={"log_likelihood": True})
idata_spline4 = model_spline4.fit(random_seed=random_seed, idata_kwargs={"log_likelihood": True})
idata_spline5 = model_spline5.fit(random_seed=random_seed, idata_kwargs={"log_likelihood": True})Plotting the fits
Plot the Weighted Mean
def plot_weighted_splines(B, idata, formula, ax, knots):
posterior_stacked = az.extract(idata)
wp = posterior_stacked[formula].mean("sample").values
plot_spline_basis(B * wp.T, df["X"].values, ax)
ax.plot(df.X.values, np.dot(B, wp.T), color="black", lw=3, label='Weighted Splines')
plot_knots(knots, ax);
ax.legend()
fig, axs = plt.subplots(5, 1, figsize=(10, 20))
axs = axs.flatten()
axs.flatten()
plot_weighted_splines(B1, idata_spline1, formula1, axs[0], knots_6)
plot_weighted_splines(B2, idata_spline2, formula2, axs[1], knots_6)
plot_weighted_splines(B3, idata_spline3, formula3, axs[2], knots_6)
plot_weighted_splines(B4, idata_spline4, formula4, axs[3], knots_10)
plot_weighted_splines(B5, idata_spline5, formula5, axs[4], knots_15)Here we can see how the models with increasingly complex splines are more exactly able to fit the herky jerky trajectory of the outcome variable in each interval. The fewer the intervals, the less flexibility available to the model.
Compare Model Fits
As before we can evaluate these model fits and compare them based on leave-one-out cross validation scores and information theoretic complexity measures.
fig, axs = plt.subplots(1, 2, figsize=(10, 7))
axs = axs.flatten()
models_dict = {"piecewise_constant": idata_spline1, "piecewise_linear": idata_spline2, "cubic_bspline": idata_spline3, "cubic_bspline_10": idata_spline4,
"cubic_bspline_15": idata_spline5}
df_compare = az.compare(models_dict)
az.plot_compare(df_compare, ax=axs[0])
az.plot_compare(az.compare(models_dict, 'waic'), ax=axs[1])
axs[1].set_yticklabels([])
df_compare| rank | elpd_loo | p_loo | elpd_diff | weight | se | dse | warning | scale | |
|---|---|---|---|---|---|---|---|---|---|
| cubic_bspline_10 | 0 | -612.274608 | 11.462200 | 0.000000 | 9.030402e-01 | 9.689854 | 0.000000 | True | log |
| cubic_bspline_15 | 1 | -616.252712 | 15.566839 | 3.978104 | 2.408230e-14 | 9.833120 | 1.374420 | True | log |
| cubic_bspline | 2 | -634.729956 | 8.698337 | 22.455348 | 5.894750e-03 | 8.897139 | 6.637243 | True | log |
| piecewise_constant | 3 | -643.781042 | 6.981098 | 31.506433 | 5.164397e-02 | 9.770740 | 8.719803 | False | log |
| piecewise_linear | 4 | -647.267410 | 6.170543 | 34.992802 | 3.942108e-02 | 7.902853 | 8.094122 | False | log |
Here we see that the extra complexity of using 15 splines leads to slightly worse performance measures than the less complex but seemingly adequate 10 splines.
new_data = pd.DataFrame({"X": np.linspace(df.X.min(), df.X.max(), num=500)})
model_spline4.predict(idata_spline4, data=new_data,
kind='pps', inplace=True)
idata_spline4-
<xarray.Dataset> Dimensions: (chain: 4, draw: 1000, bs(X, degree=3, knots=knots_10, intercept=True)_dim: 14, y_obs: 500) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 ... 999 * bs(X, degree=3, knots=knots_10, intercept=True)_dim (bs(X, degree=3, knots=knots_10, intercept=True)_dim) int64 ... * y_obs (y_obs) int64 0 ... 499 Data variables: Intercept (chain, draw) float64 ... bs(X, degree=3, knots=knots_10, intercept=True) (chain, draw, bs(X, degree=3, knots=knots_10, intercept=True)_dim) float64 ... y_sigma (chain, draw) float64 ... y_mean (chain, draw, y_obs) float64 ... Attributes: created_at: 2024-04-07T23:02:52.053793 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3 sampling_time: 7.797940015792847 tuning_steps: 1000 modeling_interface: bambi modeling_interface_version: 0.13.0 -
<xarray.Dataset> Dimensions: (chain: 4, draw: 1000, y_obs: 500) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 992 993 994 995 996 997 998 999 * y_obs (y_obs) int64 0 1 2 3 4 5 6 7 8 ... 492 493 494 495 496 497 498 499 Data variables: y (chain, draw, y_obs) float64 19.24 3.06 8.491 ... 10.07 49.53 -17.6 Attributes: modeling_interface: bambi modeling_interface_version: 0.13.0 -
<xarray.Dataset> Dimensions: (chain: 4, draw: 1000, y_obs: 133) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 992 993 994 995 996 997 998 999 * y_obs (y_obs) int64 0 1 2 3 4 5 6 7 8 ... 125 126 127 128 129 130 131 132 Data variables: y (chain, draw, y_obs) float64 -4.14 -4.143 -4.178 ... -4.008 -4.024 Attributes: created_at: 2024-04-07T23:02:52.162726 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3 modeling_interface: bambi modeling_interface_version: 0.13.0 -
<xarray.Dataset> Dimensions: (chain: 4, draw: 1000) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 1 2 3 4 5 ... 994 995 996 997 998 999 Data variables: (12/17) perf_counter_diff (chain, draw) float64 0.00246 0.005695 ... 0.008435 process_time_diff (chain, draw) float64 0.002448 0.00566 ... 0.008436 reached_max_treedepth (chain, draw) bool False False False ... False False largest_eigval (chain, draw) float64 nan nan nan nan ... nan nan nan energy_error (chain, draw) float64 -1.295 0.4866 ... 2.466 -2.027 lp (chain, draw) float64 -719.5 -721.5 ... -722.1 -722.2 ... ... step_size_bar (chain, draw) float64 0.04864 0.04864 ... 0.05827 tree_depth (chain, draw) int64 6 7 6 4 6 6 7 5 ... 5 5 3 4 5 7 8 max_energy_error (chain, draw) float64 -1.295 1.177 ... 4.799 -2.488 step_size (chain, draw) float64 0.03411 0.03411 ... 0.04662 perf_counter_start (chain, draw) float64 1.572e+06 ... 1.572e+06 energy (chain, draw) float64 728.2 728.6 ... 729.9 729.2 Attributes: created_at: 2024-04-07T23:02:52.061738 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3 sampling_time: 7.797940015792847 tuning_steps: 1000 modeling_interface: bambi modeling_interface_version: 0.13.0 -
<xarray.Dataset> Dimensions: (y_obs: 133) Coordinates: * y_obs (y_obs) int64 0 1 2 3 4 5 6 7 8 ... 125 126 127 128 129 130 131 132 Data variables: y (y_obs) float64 0.0 -1.3 -2.7 0.0 -2.7 ... -2.7 10.7 -2.7 10.7 Attributes: created_at: 2024-04-07T23:02:52.064251 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3 modeling_interface: bambi modeling_interface_version: 0.13.0
Next we plot the posterior predictive distribution of our observed variable and compare against the observed data. Additionally we plot the 89th and 50% HDI.
ax = az.plot_hdi(new_data['X'], idata_spline4['posterior_predictive']['y'], fill_kwargs={'alpha': 0.2, 'color':'firebrick'}, hdi_prob=0.89, figsize=(10, 8))
az.plot_hdi(new_data['X'], idata_spline4['posterior_predictive']['y'], fill_kwargs={'alpha': 0.8, 'color':'firebrick'}, hdi_prob=0.5)
y_mean = idata_spline4['posterior_predictive']['y'].mean(dim=('chain', 'draw'))
ax.plot(new_data['X'], y_mean, label='Expected posterior predictive', color='k')
ax.set_xlabel("Time Point")
ax.set_ylabel("Acceleration")
ax.scatter(df['X'], df['y'], label='Observed Datapoints')
ax.legend()
ax.set_title("Posterior Predictive Distribution \n Based on 10 Knots");This represents a good a clean model fit to the observed data using univariate spline smoothers. Next we’ll see another alternative approach to model this outcome variable using approximate gaussian processes.
Gaussian processes
The topic of gaussian processes is rich and detailed. Too rich to be fairly covered in this blog post, so we’ll just say that we’re using a method designed for function approximation that makes use of drawing samples from a multivariate normal distribution under a range of different covariance relationships.
These relationships can be somewhat intuitively interrogated by defining different combinations of covariance relationships with priors over the parameters governing the covariance of a sequence of points. For example consider the following parameterisations.
lengthscale = 3
sigma = 13
cov = sigma**2 * pm.gp.cov.ExpQuad(1, lengthscale)
X = np.linspace(0, 60, 200)[:, None]
K = cov(X).eval()
fig, ax = plt.subplots(figsize=(9, 7))
ax.plot(
X,
pm.draw(
pm.MvNormal.dist(mu=np.zeros(len(K)), cov=K, shape=K.shape[0]), draws=10, random_seed=random_seed
).T,
)
plt.title(f"Samples from the GP prior \n lengthscale: {3}, sigma: {13}")
plt.ylabel("y")
plt.xlabel("X");We’ve specified the range of X to reflect the support of the acceleration example and allowed the draws to be informed by a covariance function we have parameterised using the Exponentiated Quadratic kernel:
\[k(x, x') = \mathrm{exp}\left[ -\frac{(x - x')^2}{2 \ell^2} \right]\]
The patterns exhibited show a good range of “wiggliness” that they should be flexible enough to capture the shape of the acceleration, if we can calibrate the posterior of parameters against the observed data.
Priors on Gaussian Priors
Consider the following specification for the priors
fig, axs = plt.subplots(1, 2, figsize=(9, 6))
axs = axs.flatten()
axs[0].hist(pm.draw(pm.InverseGamma.dist(mu=1, sigma=1), 1000), ec='black', bins=30);
axs[0].set_title("Priors for Lengthscale \n in ExpQuad Kernel")
axs[1].hist(pm.draw(pm.Exponential.dist(lam=1), 1000), ec='black', bins=30);
axs[1].set_title("Priors for Amplitude \n in ExpQuad Kernel")Text(0.5, 1.0, 'Priors for Amplitude \n in ExpQuad Kernel')
We use these to specify priors on the Hilbert space approximation of gaussian priors available in the Bambi package.
prior_hsgp = {
"sigma": bmb.Prior("Exponential", lam=1), # amplitude
"ell": bmb.Prior("InverseGamma", mu=1, sigma=1) # lengthscale
}
# This is the dictionary we pass to Bambi
priors = {
"hsgp(X, m=10, c=1)": prior_hsgp,
"sigma": bmb.Prior("HalfNormal", sigma=4)
}
model_hsgp = bmb.Model("y ~ 0 + hsgp(X, m=10, c=1)", df, priors=priors)
model_hsgp Formula: y ~ 0 + hsgp(X, m=10, c=1)
Family: gaussian
Link: mu = identity
Observations: 133
Priors:
target = mu
HSGP contributions
hsgp(X, m=10, c=1)
cov: ExpQuad
sigma ~ Exponential(lam: 1.0)
ell ~ InverseGamma(mu: 1.0, sigma: 1.0)
Auxiliary parameters
sigma ~ HalfNormal(sigma: 4.0)
Here we’ve set the m=10 to determine the number of basis vectors used in the Hilbert space approximation. The idea differs in detail from the spline based approximations we’ve seen, but it’s perhaps useful to think of the process in the same vein.
idata = model_hsgp.fit(inference_method="nuts_numpyro",target_accept=0.95, random_seed=121195,
idata_kwargs={"log_likelihood": True})
print(idata.sample_stats["diverging"].sum().to_numpy())Compiling...
Compilation time = 0:00:00.902571
Sampling...
Sampling time = 0:00:01.637607
Transforming variables...
Transformation time = 0:00:00.113155
Computing Log Likelihood...
Log Likelihood time = 0:00:00.140654
0
This model fits and the sampling seems to have worked well.
az.plot_trace(idata, backend_kwargs={"layout": "constrained"}, figsize=(9, 15));The lengthscale and sigma parameters we have learned by calibrating our priors against the data. The degree to which these parameters are meaningful depend a little on how familar you are with covariance matrix kernels and their properties, so we won’t dwell on the point here.
az.summary(idata, var_names=['hsgp(X, m=10, c=1)_ell', 'hsgp(X, m=10, c=1)_sigma', 'y_sigma', 'hsgp(X, m=10, c=1)_weights'])| mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
|---|---|---|---|---|---|---|---|---|---|
| hsgp(X, m=10, c=1)_ell | 3.301 | 0.666 | 2.099 | 4.567 | 0.016 | 0.011 | 1763.0 | 1815.0 | 1.0 |
| hsgp(X, m=10, c=1)_sigma | 23.675 | 2.679 | 18.925 | 28.847 | 0.069 | 0.049 | 1497.0 | 2167.0 | 1.0 |
| y_sigma | 20.589 | 1.121 | 18.590 | 22.751 | 0.019 | 0.014 | 3440.0 | 2485.0 | 1.0 |
| hsgp(X, m=10, c=1)_weights[0] | -131.588 | 13.784 | -157.019 | -105.189 | 0.216 | 0.154 | 4071.0 | 3020.0 | 1.0 |
| hsgp(X, m=10, c=1)_weights[1] | -94.391 | 18.010 | -125.921 | -58.020 | 0.363 | 0.256 | 2500.0 | 2876.0 | 1.0 |
| hsgp(X, m=10, c=1)_weights[2] | 106.121 | 20.717 | 69.627 | 147.247 | 0.458 | 0.324 | 2046.0 | 2563.0 | 1.0 |
| hsgp(X, m=10, c=1)_weights[3] | 149.289 | 22.154 | 106.508 | 188.107 | 0.521 | 0.368 | 1807.0 | 2467.0 | 1.0 |
| hsgp(X, m=10, c=1)_weights[4] | -83.399 | 22.728 | -124.812 | -40.406 | 0.539 | 0.383 | 1776.0 | 2517.0 | 1.0 |
| hsgp(X, m=10, c=1)_weights[5] | -125.322 | 23.273 | -169.346 | -82.187 | 0.531 | 0.377 | 1917.0 | 2940.0 | 1.0 |
| hsgp(X, m=10, c=1)_weights[6] | 37.835 | 21.153 | -1.712 | 77.574 | 0.512 | 0.362 | 1713.0 | 2577.0 | 1.0 |
| hsgp(X, m=10, c=1)_weights[7] | 94.574 | 20.745 | 53.167 | 132.182 | 0.423 | 0.299 | 2393.0 | 3105.0 | 1.0 |
| hsgp(X, m=10, c=1)_weights[8] | -1.494 | 17.184 | -31.835 | 33.063 | 0.361 | 0.255 | 2264.0 | 2853.0 | 1.0 |
| hsgp(X, m=10, c=1)_weights[9] | -45.442 | 15.955 | -76.649 | -16.636 | 0.289 | 0.208 | 3041.0 | 3183.0 | 1.0 |
But again we can sample from the posterior predictive distribution of the outcome variable
model_hsgp.predict(idata, data=new_data,
kind='pps', inplace=True)
idata-
<xarray.Dataset> Dimensions: (chain: 4, draw: 1000, hsgp(X, m=10, c=1)_weights_dim: 10, y_obs: 500) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 1 2 3 4 ... 996 997 998 999 * hsgp(X, m=10, c=1)_weights_dim (hsgp(X, m=10, c=1)_weights_dim) int64 0 ... * y_obs (y_obs) int64 0 1 2 3 4 ... 496 497 498 499 Data variables: hsgp(X, m=10, c=1)_weights_raw (chain, draw, hsgp(X, m=10, c=1)_weights_dim) float64 ... y_sigma (chain, draw) float64 20.86 19.86 ... 21.17 hsgp(X, m=10, c=1)_sigma (chain, draw) float64 26.85 24.0 ... 18.06 hsgp(X, m=10, c=1)_ell (chain, draw) float64 3.861 3.509 ... 3.76 hsgp(X, m=10, c=1)_weights (chain, draw, hsgp(X, m=10, c=1)_weights_dim) float64 ... y_mean (chain, draw, y_obs) float64 -0.1748 ... ... hsgp(X, m=10, c=1) (chain, draw, y_obs) float64 -0.1748 ... ... Attributes: created_at: 2024-04-07T23:03:11.964764 arviz_version: 0.17.0 modeling_interface: bambi modeling_interface_version: 0.13.0 -
<xarray.Dataset> Dimensions: (chain: 4, draw: 1000, y_obs: 500) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 992 993 994 995 996 997 998 999 * y_obs (y_obs) int64 0 1 2 3 4 5 6 7 8 ... 492 493 494 495 496 497 498 499 Data variables: y (chain, draw, y_obs) float64 -0.8078 24.75 20.08 ... 3.495 16.8 Attributes: modeling_interface: bambi modeling_interface_version: 0.13.0 -
<xarray.Dataset> Dimensions: (chain: 4, draw: 1000, y_obs: 133) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 992 993 994 995 996 997 998 999 * y_obs (y_obs) int64 0 1 2 3 4 5 6 7 8 ... 125 126 127 128 129 130 131 132 Data variables: y (chain, draw, y_obs) float64 -3.957 -3.957 -3.957 ... -4.013 -4.099 Attributes: created_at: 2024-04-07T23:03:11.968292 arviz_version: 0.17.0 modeling_interface: bambi modeling_interface_version: 0.13.0 -
<xarray.Dataset> Dimensions: (chain: 4, draw: 1000) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 1 2 3 4 5 6 ... 993 994 995 996 997 998 999 Data variables: acceptance_rate (chain, draw) float64 0.9484 0.989 0.9772 ... 0.9905 0.9283 step_size (chain, draw) float64 0.1631 0.1631 ... 0.1827 0.1827 diverging (chain, draw) bool False False False ... False False False energy (chain, draw) float64 671.4 665.4 671.0 ... 670.8 674.0 n_steps (chain, draw) int64 31 31 31 15 15 31 ... 15 31 31 15 15 31 tree_depth (chain, draw) int64 5 5 5 4 4 5 5 4 4 ... 4 5 5 4 5 5 4 4 5 lp (chain, draw) float64 661.7 661.3 665.7 ... 665.6 668.8 Attributes: created_at: 2024-04-07T23:03:11.967140 arviz_version: 0.17.0 modeling_interface: bambi modeling_interface_version: 0.13.0 -
<xarray.Dataset> Dimensions: (y_obs: 133) Coordinates: * y_obs (y_obs) int64 0 1 2 3 4 5 6 7 8 ... 125 126 127 128 129 130 131 132 Data variables: y (y_obs) float64 0.0 -1.3 -2.7 0.0 -2.7 ... -2.7 10.7 -2.7 10.7 Attributes: created_at: 2024-04-07T23:03:11.968567 arviz_version: 0.17.0 inference_library: numpyro inference_library_version: 0.13.2 sampling_time: 1.637607 modeling_interface: bambi modeling_interface_version: 0.13.0
and plot the model fit to see if it can recover the observed data.
ax = az.plot_hdi(new_data['X'], idata['posterior_predictive']['y'], fill_kwargs={'alpha': 0.2, 'color':'firebrick'}, figsize=(9, 8))
az.plot_hdi(new_data['X'], idata['posterior_predictive']['y'], fill_kwargs={'alpha': 0.8, 'color':'firebrick'}, hdi_prob=0.5)
y_mean = idata['posterior_predictive']['y'].mean(dim=('chain', 'draw'))
ax.plot(new_data['X'], y_mean, label='Expected posterior predictive', color='k')
ax.scatter(df['X'], df['y'], label='Observed Datapoints')
ax.legend()
ax.set_title("Posterior Predictive Distribution \n Based on HSGP approximation")Text(0.5, 1.0, 'Posterior Predictive Distribution \n Based on HSGP approximation')
And we can compare versus the spline models to see that by the aggregate performance measures our HSGP model seems to come out on top.
models_dict = {"piecewise_constant": idata_spline1, "piecewise_linear": idata_spline2, "cubic_bspline": idata_spline3, "cubic_bspline_10": idata_spline4,
"cubic_bspline_15": idata_spline5, 'hsgp': idata}
df_compare = az.compare(models_dict)
df_compare| rank | elpd_loo | p_loo | elpd_diff | weight | se | dse | warning | scale | |
|---|---|---|---|---|---|---|---|---|---|
| hsgp | 0 | -608.819770 | 9.958827 | 0.000000 | 0.873213 | 12.009698 | 0.000000 | False | log |
| cubic_bspline_10 | 1 | -612.278059 | 11.465651 | 3.458289 | 0.000000 | 9.688433 | 2.962396 | True | log |
| cubic_bspline_15 | 2 | -616.252712 | 15.566839 | 7.432943 | 0.000000 | 9.833120 | 3.457579 | True | log |
| cubic_bspline | 3 | -634.729956 | 8.698337 | 25.910186 | 0.000000 | 8.897139 | 7.686648 | True | log |
| piecewise_constant | 4 | -643.781042 | 6.981098 | 34.961272 | 0.069506 | 9.770740 | 9.490817 | False | log |
| piecewise_linear | 5 | -647.267410 | 6.170543 | 38.447641 | 0.057282 | 7.902853 | 9.200728 | False | log |
Recap
So far we’ve seen how we can use splines and gauassian processes to model highly eccentric functional relationships where the function could be approximated with univariate smoothing routine. Next we’ll show how to use hierarchical modelling over spline fits to extract insight into the data generating process over a family of curves. In particular we’ll focus on the development of insurance loss curves.
Insurance Loss Curves: Hierarchical Spline Models
We draw on car insurance losses data set discussed in Mick Cooney’s Stan case-study, but we simplify things for ourselves considerably by focusing on two types of loss and ensuring that each year under consideration has equal observations of the accruing losses.
loss_df = pd.read_csv('ppauto_pos.csv')
loss_df = loss_df[(loss_df['GRCODE'].isin([43, 353])) & (loss_df['DevelopmentYear'] < 1998)]
loss_df = loss_df[['GRCODE', 'AccidentYear', 'DevelopmentYear', 'DevelopmentLag', 'EarnedPremDIR_B', 'CumPaidLoss_B']]
loss_df.columns = ['grcode', 'acc_year', 'dev_year', 'dev_lag', 'premium', 'cum_loss']
loss_df['lr'] = loss_df['cum_loss'] / loss_df['premium']
loss_df = loss_df[(loss_df['acc_year'] <= 1992) & (loss_df['dev_lag'] <= 6)].reset_index(drop=True)
loss_df['year_code'] = loss_df['acc_year'].astype(str) + '_' + loss_df['grcode'].astype(str)
loss_df.sort_values(by=['year_code', 'acc_year', 'dev_lag'], inplace=True)
loss_df['standardised_premium'] = (loss_df.assign(mean_premium = np.mean(loss_df['premium']))
.assign(std_premium = np.std(loss_df['premium']))
.apply(lambda x: (x['mean_premium'] - x['premium']) /x['std_premium'], axis=1)
)
loss_df.head(12)| grcode | acc_year | dev_year | dev_lag | premium | cum_loss | lr | year_code | standardised_premium | |
|---|---|---|---|---|---|---|---|---|---|
| 30 | 353 | 1988 | 1988 | 1 | 18793 | 4339 | 0.230884 | 1988_353 | -0.347453 |
| 31 | 353 | 1988 | 1989 | 2 | 18793 | 9617 | 0.511733 | 1988_353 | -0.347453 |
| 32 | 353 | 1988 | 1990 | 3 | 18793 | 11584 | 0.616400 | 1988_353 | -0.347453 |
| 33 | 353 | 1988 | 1991 | 4 | 18793 | 12001 | 0.638589 | 1988_353 | -0.347453 |
| 34 | 353 | 1988 | 1992 | 5 | 18793 | 12640 | 0.672591 | 1988_353 | -0.347453 |
| 35 | 353 | 1988 | 1993 | 6 | 18793 | 12966 | 0.689938 | 1988_353 | -0.347453 |
| 0 | 43 | 1988 | 1988 | 1 | 957 | 133 | 0.138976 | 1988_43 | 1.722344 |
| 1 | 43 | 1988 | 1989 | 2 | 957 | 333 | 0.347962 | 1988_43 | 1.722344 |
| 2 | 43 | 1988 | 1990 | 3 | 957 | 431 | 0.450366 | 1988_43 | 1.722344 |
| 3 | 43 | 1988 | 1991 | 4 | 957 | 570 | 0.595611 | 1988_43 | 1.722344 |
| 4 | 43 | 1988 | 1992 | 5 | 957 | 615 | 0.642633 | 1988_43 | 1.722344 |
| 5 | 43 | 1988 | 1993 | 6 | 957 | 615 | 0.642633 | 1988_43 | 1.722344 |
Plot the Loss Curves
Here we have plotted the developing loss curves from two different coded insurance products.
pivot = loss_df.pivot(index=['dev_lag'], columns=['grcode', 'acc_year'], values='lr')
fig, axs = plt.subplots(1, 2, figsize=(9, 7))
pivot.plot(figsize=(10, 6), ax=axs[0])
axs[0].set_title("Loss Ratios by Year");
for c in pivot.columns:
if 43 in c:
color='red'
else:
color='grey'
axs[1].plot(pivot[c], color=color, label=c)
axs[1].legend()
axs[1].set_title("Loss Ratio by Group")Text(0.5, 1.0, 'Loss Ratio by Group')
We want to model these curves collectively as instances of draws from a distribution of loss curves. To do so we will specify a PyMC hierarchical (mixed) spline model. To do so we will have a spline basis for the global hyper parameters beta_g and the individual parameters for each curve. Here we define a convenience function to generate the basis splines.
from patsy import bs as bs_patsy, dmatrix
def make_basis_splines(num_knots=3, max_dev=7):
knot_list = np.linspace(0, max_dev, num_knots+2)[1:-1]
dev_periods = np.arange(1, max_dev, 1)
Bi = dmatrix(
"bs_patsy(dev_periods, knots=knots, degree=3, include_intercept=True) - 1",
{"dev_periods": dev_periods, "knots": knot_list},
)
Bg = dmatrix(
"bs_patsy(dev_periods, knots=knots, degree=3, include_intercept=True) - 1",
{"dev_periods": dev_periods, "knots": knot_list})
return Bi, Bg
Bi, Bg = make_basis_splines()
BgDesignMatrix with shape (6, 7)
Columns:
['bs_patsy(dev_periods, knots=knots, degree=3, include_intercept=True)[0]',
'bs_patsy(dev_periods, knots=knots, degree=3, include_intercept=True)[1]',
'bs_patsy(dev_periods, knots=knots, degree=3, include_intercept=True)[2]',
'bs_patsy(dev_periods, knots=knots, degree=3, include_intercept=True)[3]',
'bs_patsy(dev_periods, knots=knots, degree=3, include_intercept=True)[4]',
'bs_patsy(dev_periods, knots=knots, degree=3, include_intercept=True)[5]',
'bs_patsy(dev_periods, knots=knots, degree=3, include_intercept=True)[6]']
Terms:
'bs_patsy(dev_periods, knots=knots, degree=3, include_intercept=True)' (columns 0:7)
(to view full data, use np.asarray(this_obj))
Next we specify a model maker function to create the various pooled, unpooled and hierarhical (mixed) models of the insurance curve data. Note that even though we’re specifying a hierarhical model we have not specified a hierarchy over the insurance codes, instead we have added this as a “fixed” effect feature into our regression model. The idea is that this fixed effect will capture the differences in expected baseline per group code of insurance product.
def make_model(loss_df, num_knots=3, max_dev=7, model_type='mixed'):
Bi, Bg = make_basis_splines(num_knots, max_dev)
observed = loss_df['lr'].values
uniques, unique_codes = pd.factorize(loss_df['year_code'])
coords= {'years': unique_codes,
'splines': list(range(Bi.shape[1])) ,
'measurement': list(range(6)),
'obs': uniques
}
with pm.Model(coords=coords) as sp_insur:
basis_g = pm.MutableData('Bg', np.asfortranarray(Bg))
tau = pm.HalfCauchy('tau', 1)
## Global Hierarchical Spline Terms
beta_g = pm.Normal("beta_g", mu=0, sigma=tau,
dims='splines')
mu_g = pm.Deterministic("mu_g", pm.math.dot(basis_g, beta_g), dims='measurement')
## Individual or Year Specific Spline Modifications
if model_type in ['mixed', 'unpooled']:
basis_i = pm.MutableData('Bi', np.asfortranarray(Bi))
beta = pm.Normal("beta", mu=0, sigma=tau, dims=('splines', 'years'))
mui = pm.Deterministic("mui", pm.math.dot(basis_i, beta), dims=('measurement', 'years'))
## Features
prem = pm.MutableData('prem', loss_df['standardised_premium'].values)
grcode = pm.MutableData('grcode', loss_df['grcode'] == 43)
beta_prem = pm.Normal('beta_prem', 0, 1)
beta_grcode = pm.Normal('beta_grcode', 0, 1)
mu_prem = beta_prem*prem
mu_grcode = beta_grcode*grcode
## Likelihood
sigma = pm.TruncatedNormal("sigma", 1, lower=0.05)
if model_type == 'mixed':
mu = pm.Deterministic('mu', mu_grcode + mu_prem + (mu_g.T + mui.T).ravel(), dims='obs')
lr_likelihood = pm.Normal("lr", mu, sigma, observed=observed, dims=('obs'))
elif model_type == 'pooled':
lr_likelihood = pm.Normal("lr", np.repeat(mu_g, len(unique_codes)), sigma, observed=observed, dims='obs')
elif model_type == 'unpooled':
lr_likelihood = pm.Normal("lr", mui.T.ravel(), sigma, observed=observed, dims=('obs'))
## Sampling
idata_sp_insur = pm.sample(2000, return_inferencedata=True, target_accept=.99,
idata_kwargs={"log_likelihood": True})
idata_sp_insur = pm.sample_posterior_predictive(
idata_sp_insur,extend_inferencedata=True)
return idata_sp_insur, sp_insur
idata_sp_insur_unpooled, sp_insur_unpooled = make_model(loss_df, model_type='unpooled')
idata_sp_insur_pooled, sp_insur_pooled = make_model(loss_df, model_type='pooled')
idata_sp_insur_mixed, sp_insur_mixed = make_model(loss_df, model_type='mixed')The model structure can be seen more clearly in this graph
pm.model_to_graphviz(sp_insur_mixed)We can extract the effect of the differences grcodes and examine the baseline and annual spline related coefficients.
summary = az.summary(idata_sp_insur_mixed, var_names=['beta_grcode', 'beta_prem', 'beta_g', 'beta'])
summary| mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
|---|---|---|---|---|---|---|---|---|---|
| beta_grcode | 0.333 | 0.073 | 0.196 | 0.468 | 0.001 | 0.001 | 3681.0 | 4549.0 | 1.0 |
| beta_prem | -0.055 | 0.039 | -0.130 | 0.018 | 0.001 | 0.000 | 3991.0 | 4588.0 | 1.0 |
| beta_g[0] | 0.077 | 0.087 | -0.085 | 0.239 | 0.002 | 0.001 | 2867.0 | 3682.0 | 1.0 |
| beta_g[1] | 0.047 | 0.178 | -0.293 | 0.375 | 0.002 | 0.002 | 6887.0 | 5984.0 | 1.0 |
| beta_g[2] | 0.440 | 0.135 | 0.194 | 0.697 | 0.002 | 0.001 | 5224.0 | 5419.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| beta[6, 1990_43] | 0.550 | 0.108 | 0.353 | 0.758 | 0.002 | 0.001 | 3694.0 | 5325.0 | 1.0 |
| beta[6, 1991_353] | -0.011 | 0.103 | -0.216 | 0.173 | 0.002 | 0.001 | 3320.0 | 4653.0 | 1.0 |
| beta[6, 1991_43] | 0.106 | 0.106 | -0.098 | 0.301 | 0.002 | 0.001 | 3673.0 | 5075.0 | 1.0 |
| beta[6, 1992_353] | -0.027 | 0.102 | -0.215 | 0.168 | 0.002 | 0.001 | 3522.0 | 4359.0 | 1.0 |
| beta[6, 1992_43] | -0.066 | 0.128 | -0.307 | 0.178 | 0.002 | 0.002 | 3603.0 | 4731.0 | 1.0 |
79 rows × 9 columns
Again we can compare the performance metrics of the various models.
compare_df = az.compare({'unpooled': idata_sp_insur_unpooled,
'pooled': idata_sp_insur_pooled,
'mixed': idata_sp_insur_mixed})
az.plot_compare(compare_df)
compare_df| rank | elpd_loo | p_loo | elpd_diff | weight | se | dse | warning | scale | |
|---|---|---|---|---|---|---|---|---|---|
| mixed | 0 | 62.467712 | 36.162086 | 0.000000 | 1.000000e+00 | 2.543101 | 0.000000 | True | log |
| unpooled | 1 | 40.742353 | 46.418250 | 21.725359 | 2.394290e-11 | 1.474251 | 1.881133 | True | log |
| pooled | 2 | -9.217271 | 5.885869 | 71.684983 | 0.000000e+00 | 5.369316 | 4.178892 | False | log |
Plot the Posterior Predictive Checks
We can check how well the model can recapture the observed data.
def plot_ppc_splines(idata):
fig, axs = plt.subplots(5, 2, figsize=(9, 20), sharey=True)
axs = axs.flatten()
dev_periods = np.arange(1, 7, 1)
uniques, unique_codes = pd.factorize(loss_df['year_code'])
for y, c in zip(unique_codes, range(10)):
az.plot_hdi(dev_periods, idata['posterior_predictive']['lr'].sel(obs=c), color='firebrick', ax=axs[c], fill_kwargs={'alpha': 0.2}, hdi_prob=.89)
az.plot_hdi(dev_periods, idata['posterior_predictive']['lr'].sel(obs=c), color='firebrick', ax=axs[c], hdi_prob=0.5)
axs[c].scatter(dev_periods, loss_df[(loss_df['year_code'] == y)]['lr'], color='k', label='Actual Loss Ratio')
axs[c].plot(dev_periods, loss_df[(loss_df['year_code'] == y)]['lr'], color='k')
axs[c].set_title(f"PPC 89% and 50% HDI: {y}")
axs[c].set_ylabel("Loss Ratio")
axs[c].legend();
plot_ppc_splines(idata_sp_insur_mixed)Plot the Hierarchical Components
In the following plot we show similarly how to recapture the observed data, but additionally we decompose the structure of the model in each case and extract baseline forecasts which would be our guide to future loss-ratio development curves in lieu of any other information.
mu_g = idata_sp_insur_mixed.posterior.stack(draws=("chain", "draw"))["mu_g"]
mu_i = idata_sp_insur_mixed.posterior.stack(draws=("chain", "draw"))["mui"]
mu = idata_sp_insur_mixed.posterior.stack(draws=("chain", "draw"))['mu']
beta_grcode = idata_sp_insur_mixed.posterior.stack(draws=("chain", "draw"))['beta_grcode']
dev_periods = np.arange(1, 7, 1)
uniques, unique_codes = pd.factorize(loss_df['year_code'])
mosaic = """
AB
CD
EF
GH
IJ
KK
"""
fig, axs = plt.subplot_mosaic(mosaic, sharey=True,
figsize=(9, 20))
axs = [axs[k] for k in axs.keys()]
mu_g_mean = mu_g.mean(dim='draws')
for y, c in zip(unique_codes, range(10)):
group_effect = 0
if '43' in y:
group_effect = beta_grcode.mean().item()
mu_i_mean = mu_i.sel(years=y).mean(dim='draws')
axs[c].plot(dev_periods, group_effect + mu_g_mean.values + mu_i_mean.values, label='Combined + E(grp_effect)', color='purple', linewidth=3.5)
axs[c].plot(dev_periods, group_effect + mu_g_mean.values, label='E(Hierarchical Baseline)', color='red', linestyle='--')
axs[c].plot(dev_periods, group_effect + mu_i_mean.values, label='E(Year Specific Adjustment term)', color='blue', linestyle='--')
axs[c].scatter(dev_periods, loss_df[(loss_df['year_code'] == y)]['lr'], color='k', label='Actual Loss Ratio')
az.plot_hdi(dev_periods,mu.sel(obs=c).T , ax=axs[c], color='firebrick', fill_kwargs={'alpha': 0.2})
az.plot_hdi(dev_periods, mu.sel(obs=c).T , ax=axs[c], color='firebrick', fill_kwargs={'alpha': 0.5}, hdi_prob=.50)
axs[c].set_title(f"Components for Year {y}")
axs[c].set_ylabel("Loss Ratio")
if (c == 0) or (c == 1):
axs[c].legend()
axs[10].plot(dev_periods, mu_g_mean.values, label='E(Hierarchical Baseline)', color='black')
axs[10].plot(dev_periods, mu_g_mean.values + group_effect, label='E(Hierarchical Baseline) + E(grp_effect)', color='black', linestyle='--')
az.plot_hdi(dev_periods, mu_g.T.values, color='slateblue', ax=axs[10], fill_kwargs={'alpha': 0.2})
az.plot_hdi(dev_periods, mu_g.T.values + group_effect, color='magenta', ax=axs[10], fill_kwargs={'alpha': 0.2})
az.plot_hdi(dev_periods, mu_g.T.values, color='slateblue', ax=axs[10], hdi_prob=.5)
az.plot_hdi(dev_periods, mu_g.T.values + group_effect, color='magenta', ax=axs[10], hdi_prob=.5)
axs[10].set_title("Baseline Forecast Loss Ratio")
axs[10].legend();This plot is a bit clunky, because we’re mixing expectations and posterior distributions over the parameters. The point is just to highlight the “compositional” structure of our model. A better way to interrogate the implications of the model is to “push” forward different data through the posterior predictive distribution and derive a kind of ceteris paribus rule for accrual of losses.
with sp_insur_mixed:
pm.set_data({'grcode': np.ones(len(loss_df)),
})
idata_43 = pm.sample_posterior_predictive(idata_sp_insur_mixed, var_names=['lr'], extend_inferencedata =True)
with sp_insur_mixed:
pm.set_data({'grcode': np.zeros(len(loss_df))})
idata_353 = pm.sample_posterior_predictive(idata_sp_insur_mixed, var_names=['lr'], extend_inferencedata=True)
idata_353Sampling: [lr]
Sampling: [lr]
-
<xarray.Dataset> Dimensions: (chain: 4, draw: 2000, splines: 7, years: 10, measurement: 6, obs: 60) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 1 2 3 4 5 6 ... 1994 1995 1996 1997 1998 1999 * splines (splines) int64 0 1 2 3 4 5 6 * years (years) <U8 '1988_353' '1988_43' ... '1992_353' '1992_43' * measurement (measurement) int64 0 1 2 3 4 5 * obs (obs) int64 0 0 0 0 0 0 1 1 1 1 1 1 ... 8 8 8 8 8 8 9 9 9 9 9 9 Data variables: beta_g (chain, draw, splines) float64 -0.01035 -0.116 ... 0.6097 beta (chain, draw, splines, years) float64 0.117 ... -0.007498 beta_prem (chain, draw) float64 -0.1393 -0.1123 ... -0.006277 -0.03971 beta_grcode (chain, draw) float64 0.4144 0.3885 0.369 ... 0.3 0.348 0.3338 tau (chain, draw) float64 0.2756 0.2812 0.2426 ... 0.2408 0.2445 sigma (chain, draw) float64 0.0613 0.06007 ... 0.07141 0.05469 mu_g (chain, draw, measurement) float64 -0.01035 0.3518 ... 0.6097 mui (chain, draw, measurement, years) float64 0.117 ... -0.007498 mu (chain, draw, obs) float64 0.155 0.3726 ... 0.9142 0.9984 Attributes: created_at: 2024-04-07T23:04:54.165388 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3 sampling_time: 40.95733094215393 tuning_steps: 1000 -
<xarray.Dataset> Dimensions: (chain: 4, draw: 2000, obs: 60) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 1 2 3 4 5 6 7 ... 1993 1994 1995 1996 1997 1998 1999 * obs (obs) int64 0 0 0 0 0 0 1 1 1 1 1 1 2 ... 7 8 8 8 8 8 8 9 9 9 9 9 9 Data variables: lr (chain, draw, obs) float64 0.2182 0.3866 0.5316 ... 0.8773 1.057 Attributes: created_at: 2024-04-07T23:04:56.246066 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3 -
<xarray.Dataset> Dimensions: (chain: 4, draw: 2000, obs: 60) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 1 2 3 4 5 6 7 ... 1993 1994 1995 1996 1997 1998 1999 * obs (obs) int64 0 0 0 0 0 0 1 1 1 1 1 1 2 ... 7 8 8 8 8 8 8 9 9 9 9 9 9 Data variables: lr (chain, draw, obs) float64 1.108 -0.7015 1.519 ... 1.969 1.072 Attributes: created_at: 2024-04-07T23:04:54.421910 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3 -
<xarray.Dataset> Dimensions: (chain: 4, draw: 2000) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999 Data variables: (12/17) perf_counter_diff (chain, draw) float64 0.009466 0.009488 ... 0.009225 process_time_diff (chain, draw) float64 0.009467 0.009488 ... 0.009225 reached_max_treedepth (chain, draw) bool False False False ... False False largest_eigval (chain, draw) float64 nan nan nan nan ... nan nan nan energy_error (chain, draw) float64 0.01364 -0.01196 ... 0.004259 lp (chain, draw) float64 64.49 70.66 ... 68.69 74.42 ... ... step_size_bar (chain, draw) float64 0.03879 0.03879 ... 0.03909 tree_depth (chain, draw) int64 7 7 7 7 7 8 8 7 ... 7 8 8 7 7 7 7 max_energy_error (chain, draw) float64 -0.03853 -0.04207 ... -0.05341 step_size (chain, draw) float64 0.03762 0.03762 ... 0.03715 perf_counter_start (chain, draw) float64 1.572e+06 ... 1.572e+06 energy (chain, draw) float64 -26.3 -27.7 ... -22.29 -35.84 Attributes: created_at: 2024-04-07T23:04:54.176614 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3 sampling_time: 40.95733094215393 tuning_steps: 1000 -
<xarray.Dataset> Dimensions: (obs: 60) Coordinates: * obs (obs) int64 0 0 0 0 0 0 1 1 1 1 1 1 2 ... 7 8 8 8 8 8 8 9 9 9 9 9 9 Data variables: lr (obs) float64 0.2309 0.5117 0.6164 0.6386 ... 0.859 0.9038 0.9244 Attributes: created_at: 2024-04-07T23:04:54.179104 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3 -
<xarray.Dataset> Dimensions: (Bg_dim_0: 6, Bg_dim_1: 7, Bi_dim_0: 6, Bi_dim_1: 7, prem_dim_0: 60, grcode_dim_0: 60) Coordinates: * Bg_dim_0 (Bg_dim_0) int64 0 1 2 3 4 5 * Bg_dim_1 (Bg_dim_1) int64 0 1 2 3 4 5 6 * Bi_dim_0 (Bi_dim_0) int64 0 1 2 3 4 5 * Bi_dim_1 (Bi_dim_1) int64 0 1 2 3 4 5 6 * prem_dim_0 (prem_dim_0) int64 0 1 2 3 4 5 6 7 ... 52 53 54 55 56 57 58 59 * grcode_dim_0 (grcode_dim_0) int64 0 1 2 3 4 5 6 7 ... 53 54 55 56 57 58 59 Data variables: Bg (Bg_dim_0, Bg_dim_1) float64 1.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 Bi (Bi_dim_0, Bi_dim_1) float64 1.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 prem (prem_dim_0) float64 -0.3475 -0.3475 -0.3475 ... -1.572 -1.572 grcode (grcode_dim_0) float64 0.0 0.0 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0 Attributes: created_at: 2024-04-07T23:04:54.179985 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3
Even here though we want to average the curves over the specific years in the data and abstract a view of the model implications under different counterfactual settings. Or put another way we a view of the average development curve between years not within a year. Here we define a helper function to effect this step.
def get_posterior_predictive_curve(idata, prem=2, grcode=1):
weighted_splines = [np.dot(np.asfortranarray(Bi), az.extract(idata['posterior']['beta'])['beta'].values[:, :, i]) for i in range(1000)]
weighted_splines_1 = [np.dot(np.asfortranarray(Bg), az.extract(idata['posterior']['beta_g'])['beta_g'].values[:, i]) for i in range(1000)]
beta_grcode = az.extract(idata['posterior']['beta_grcode'])['beta_grcode']
beta_prem = az.extract(idata['posterior']['beta_prem'])['beta_prem']
df1 = pd.DataFrame([beta_prem.values[i]*prem + beta_grcode.values[i]*grcode for i in range(1000)]).T
## Crucial step averaging over the years to get
## a view of the development period
df = pd.concat([pd.DataFrame(weighted_splines_1[i].T + weighted_splines[i].T).mean() for i in range(1000)], axis=1)
df = df1.iloc[0] + df
return df
pred_df_1 = get_posterior_predictive_curve(idata_43, prem=1, grcode=1)
pred_df_0 = get_posterior_predictive_curve(idata_353, prem=1, grcode=0)
fig, ax = plt.subplots(figsize=(9, 7), sharey=True)
ax.plot(pred_df_1, color='firebrick', alpha=0.1)
ax.plot(pred_df_0, color='slateblue', alpha=0.2);
ax.plot(pred_df_0.mean(axis=1), linestyle='-', color='k', linewidth=4, label='grcode 353 prem 1')
ax.plot(pred_df_1.mean(axis=1), linestyle='--', color='grey', linewidth=4, label='grcode 43 prem 1')
ax.set_title("Posterior Samples of the Trajectories \n Under different Counterfactual settings")
ax.set_ylabel("Loss Ratio")
ax.set_xlabel("Development Period")
ax.legend();Multiple Smoother Regression Models
To demonstrate how spline modelling can be further adapting to the multiple regression like cases we use the PISA data set discussed in this case study from Michael Clark.
The data set has been constructed using average Science scores by country from the Programme for International Student Assessment (PISA) 2006, along with GNI per capita (Purchasing Power Parity, 2005 dollars), Educational Index, Health Index, and Human Development Index from UN data. We want to model the overall outcome score as a function of these broad demographic features.
pisa_df = pd.read_csv("https://raw.githubusercontent.com/m-clark/generalized-additive-models/master/data/pisasci2006.csv")
pisa_df| Country | Overall | Issues | Explain | Evidence | Interest | Support | Income | Health | Edu | HDI | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Albania | NaN | NaN | NaN | NaN | NaN | NaN | 0.599 | 0.886 | 0.716 | 0.724 |
| 1 | Argentina | 391.0 | 395.0 | 386.0 | 385.0 | 567.0 | 506.0 | 0.678 | 0.868 | 0.786 | 0.773 |
| 2 | Australia | 527.0 | 535.0 | 520.0 | 531.0 | 465.0 | 487.0 | 0.826 | 0.965 | 0.978 | 0.920 |
| 3 | Austria | 511.0 | 505.0 | 516.0 | 505.0 | 507.0 | 515.0 | 0.835 | 0.944 | 0.824 | 0.866 |
| 4 | Azerbaijan | 382.0 | 353.0 | 412.0 | 344.0 | 612.0 | 542.0 | 0.566 | 0.780 | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 60 | Turkey | 424.0 | 427.0 | 423.0 | 417.0 | 540.0 | 563.0 | 0.679 | 0.828 | 0.562 | 0.681 |
| 61 | United Arab Emirates | NaN | NaN | NaN | NaN | NaN | NaN | 0.909 | 0.878 | 0.686 | 0.818 |
| 62 | United Kingdom | 515.0 | 514.0 | 517.0 | 514.0 | 464.0 | 470.0 | 0.833 | 0.934 | 0.798 | 0.853 |
| 63 | United States | 489.0 | 492.0 | 486.0 | 489.0 | 480.0 | 490.0 | 0.872 | 0.911 | 0.930 | 0.904 |
| 64 | Uruguay | 428.0 | 429.0 | 423.0 | 429.0 | 567.0 | 510.0 | 0.658 | 0.884 | 0.740 | 0.755 |
65 rows × 11 columns
The relationships displayed between each of these measures is not obviously linear, and as such could plausibly benefit from being modelled with splines.
g = sns.pairplot(pisa_df[['Overall', 'Income', 'Support', 'Health', 'Edu']], kind="reg")
g.fig.suptitle("Pair Plot of Complex Relations", y=1.05);We define two basic models for contrasting. Note here how we have to define a seperate spline basis for each of the covariates.
formula = "Overall ~ Income + Edu + Health"
base_model = bmb.Model(formula, pisa_df, dropna=True)
knots_income = np.linspace(np.min(pisa_df['Income']), np.max(pisa_df['Income']), 5+2)[1:-1]
knots_edu = np.linspace(np.min(pisa_df['Edu']), np.max(pisa_df['Edu']), 5+2)[1:-1]
knots_health = np.linspace(np.min(pisa_df['Health']), np.max(pisa_df['Health']), 5+2)[1:-1]
formula_spline = """Overall ~ bs(Income, degree=3, knots=knots_income) + bs(Edu, degree=3, knots=knots_edu) + bs(Health, degree=3, knots=knots_health)"""
spline_model = bmb.Model(formula_spline, pisa_df, dropna=True)
spline_modelAutomatically removing 13/65 rows from the dataset.
Automatically removing 13/65 rows from the dataset.
Formula: Overall ~ bs(Income, degree=3, knots=knots_income) + bs(Edu, degree=3, knots=knots_edu) + bs(Health, degree=3, knots=knots_health)
Family: gaussian
Link: mu = identity
Observations: 52
Priors:
target = mu
Common-level effects
Intercept ~ Normal(mu: 471.1538, sigma: 464.5641)
bs(Income, degree=3, knots=knots_income) ~ Normal(mu: [0. 0. 0. 0. 0. 0. 0. 0.], sigma:
[2700.1849 1474.917 742.9386 517.2898 651.4592 515.7397 867.8945
876.0978])
bs(Edu, degree=3, knots=knots_edu) ~ Normal(mu: [0. 0. 0. 0. 0. 0. 0. 0.], sigma: [1120.7928
920.8213 702.8955 537.103 560.453 582.8883 1023.4867
680.9089])
bs(Health, degree=3, knots=knots_health) ~ Normal(mu: [0. 0. 0. 0. 0. 0. 0. 0.], sigma:
[1465.0684 3939.0308 690.5161 531.6981 655.7589 552.7209 642.9461
925.9747])
Auxiliary parameters
sigma ~ HalfStudentT(nu: 4.0, sigma: 53.4654)
spline_model.build()
spline_model.graph()base_idata = base_model.fit(random_seed=random_seed, idata_kwargs={"log_likelihood": True})
spline_idata = spline_model.fit(random_seed=random_seed, idata_kwargs={"log_likelihood": True}, target_accept=.95)Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Overall_sigma, Intercept, Income, Edu, Health]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 2 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Overall_sigma, Intercept, bs(Income, degree=3, knots=knots_income), bs(Edu, degree=3, knots=knots_edu), bs(Health, degree=3, knots=knots_health)]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 27 seconds.
We can compare the simple regression approach to the spline based regression in the now usual way.
compare_df = az.compare({'spline': spline_idata, 'raw': base_idata})
compare_df| rank | elpd_loo | p_loo | elpd_diff | weight | se | dse | warning | scale | |
|---|---|---|---|---|---|---|---|---|---|
| spline | 0 | -254.101306 | 20.218465 | 0.000000 | 0.608049 | 4.724572 | 0.000000 | True | log |
| raw | 1 | -263.061455 | 9.617888 | 8.960149 | 0.391951 | 10.083921 | 9.924089 | True | log |
The coefficients comparisons are harder
az.summary(base_idata)| mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | 118.438 | 81.996 | -41.166 | 266.033 | 1.522 | 1.086 | 2915.0 | 2986.0 | 1.0 |
| Income | 181.955 | 89.547 | 17.413 | 353.598 | 1.913 | 1.353 | 2196.0 | 2432.0 | 1.0 |
| Edu | 234.125 | 59.153 | 120.739 | 340.343 | 1.017 | 0.740 | 3414.0 | 2518.0 | 1.0 |
| Health | 30.395 | 140.966 | -223.699 | 298.258 | 2.852 | 2.153 | 2450.0 | 2598.0 | 1.0 |
| Overall_sigma | 34.234 | 3.620 | 27.882 | 41.175 | 0.060 | 0.043 | 3757.0 | 2879.0 | 1.0 |
since it’s less clear what the spline coefficient terms mean.
az.summary(spline_idata)| mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | 225.689 | 205.515 | -133.222 | 630.021 | 6.036 | 4.292 | 1163.0 | 1653.0 | 1.0 |
| bs(Income, degree=3, knots=knots_income)[0] | -355.159 | 1613.343 | -3460.788 | 2642.535 | 56.357 | 39.865 | 819.0 | 1504.0 | 1.0 |
| bs(Income, degree=3, knots=knots_income)[1] | 194.905 | 122.719 | -42.478 | 423.960 | 2.343 | 1.728 | 2748.0 | 2719.0 | 1.0 |
| bs(Income, degree=3, knots=knots_income)[2] | 48.748 | 123.638 | -182.012 | 272.863 | 4.065 | 2.875 | 923.0 | 1431.0 | 1.0 |
| bs(Income, degree=3, knots=knots_income)[3] | 68.533 | 98.242 | -118.858 | 250.496 | 3.369 | 2.383 | 853.0 | 1416.0 | 1.0 |
| bs(Income, degree=3, knots=knots_income)[4] | 210.290 | 103.892 | 17.347 | 406.753 | 3.626 | 2.565 | 825.0 | 1404.0 | 1.0 |
| bs(Income, degree=3, knots=knots_income)[5] | 222.546 | 105.553 | 31.163 | 427.878 | 3.589 | 2.539 | 866.0 | 1442.0 | 1.0 |
| bs(Income, degree=3, knots=knots_income)[6] | 202.395 | 105.402 | 0.453 | 394.110 | 3.629 | 2.567 | 846.0 | 1507.0 | 1.0 |
| bs(Income, degree=3, knots=knots_income)[7] | 81.592 | 103.313 | -106.154 | 279.937 | 3.641 | 2.575 | 805.0 | 1390.0 | 1.0 |
| bs(Edu, degree=3, knots=knots_edu)[0] | 145.558 | 287.518 | -410.399 | 654.146 | 8.525 | 6.029 | 1142.0 | 1509.0 | 1.0 |
| bs(Edu, degree=3, knots=knots_edu)[1] | 47.574 | 193.029 | -323.504 | 392.406 | 5.595 | 3.957 | 1197.0 | 1656.0 | 1.0 |
| bs(Edu, degree=3, knots=knots_edu)[2] | 71.806 | 211.848 | -312.889 | 469.050 | 6.246 | 4.417 | 1156.0 | 1647.0 | 1.0 |
| bs(Edu, degree=3, knots=knots_edu)[3] | 147.390 | 200.622 | -224.182 | 518.214 | 5.972 | 4.224 | 1134.0 | 1603.0 | 1.0 |
| bs(Edu, degree=3, knots=knots_edu)[4] | 70.472 | 203.119 | -314.815 | 427.662 | 6.018 | 4.256 | 1144.0 | 1535.0 | 1.0 |
| bs(Edu, degree=3, knots=knots_edu)[5] | 121.297 | 202.750 | -264.164 | 485.547 | 5.998 | 4.243 | 1147.0 | 1575.0 | 1.0 |
| bs(Edu, degree=3, knots=knots_edu)[6] | 78.372 | 206.089 | -296.582 | 457.669 | 6.012 | 4.252 | 1178.0 | 1713.0 | 1.0 |
| bs(Edu, degree=3, knots=knots_edu)[7] | 124.157 | 203.070 | -258.062 | 490.727 | 6.008 | 4.249 | 1148.0 | 1532.0 | 1.0 |
| bs(Health, degree=3, knots=knots_health)[0] | 235.831 | 885.656 | -1490.479 | 1862.011 | 31.104 | 22.002 | 813.0 | 1482.0 | 1.0 |
| bs(Health, degree=3, knots=knots_health)[1] | 162.805 | 389.636 | -552.198 | 913.839 | 7.756 | 6.545 | 2529.0 | 2422.0 | 1.0 |
| bs(Health, degree=3, knots=knots_health)[2] | 14.967 | 120.574 | -202.130 | 248.327 | 3.854 | 2.726 | 979.0 | 1657.0 | 1.0 |
| bs(Health, degree=3, knots=knots_health)[3] | 25.213 | 100.498 | -170.867 | 204.686 | 3.411 | 2.413 | 871.0 | 1555.0 | 1.0 |
| bs(Health, degree=3, knots=knots_health)[4] | -21.315 | 102.473 | -223.005 | 158.665 | 3.577 | 2.530 | 822.0 | 1468.0 | 1.0 |
| bs(Health, degree=3, knots=knots_health)[5] | -19.449 | 101.418 | -213.704 | 165.809 | 3.541 | 2.505 | 823.0 | 1444.0 | 1.0 |
| bs(Health, degree=3, knots=knots_health)[6] | -55.820 | 100.895 | -252.216 | 126.193 | 3.547 | 2.509 | 811.0 | 1347.0 | 1.0 |
| bs(Health, degree=3, knots=knots_health)[7] | 0.840 | 101.892 | -196.002 | 186.216 | 3.543 | 2.506 | 828.0 | 1469.0 | 1.0 |
| Overall_sigma | 24.766 | 3.303 | 18.816 | 30.884 | 0.085 | 0.060 | 1491.0 | 2068.0 | 1.0 |
We can also check that the model seems to recover the observed data well.
### Plotting Posterior Predictive Checks
base_model.predict(base_idata, kind='pps')
spline_model.predict(spline_idata, kind='pps')
fig, axs = plt.subplots(2, 1, figsize=(8, 12))
axs = axs.flatten()
az.plot_ppc(base_idata, ax=axs[0])
az.plot_ppc(spline_idata, ax=axs[1]);
axs[0].set_xlabel('')
axs[1].set_title("PPC: Spline Model");
axs[0].set_title("PPC: Regression Model");Next we’ll dig into the spline basis features and decompose the predicted outcome and show how the outcome varies for levels in the inputs variables. The Bambi interpret package offers some functionality to assess the conditional predictions for varying values of the input variables.
Model Interpretability Plots in Bambi
fig, axs = plt.subplots(3, 1, figsize=(9, 10))
axs = axs.flatten()
bmb.interpret.plot_predictions(
spline_model, spline_idata, "Income", ax=axs[0]
)
bmb.interpret.plot_predictions(
spline_model, spline_idata, "Edu", ax=axs[1]
)
bmb.interpret.plot_predictions(
spline_model, spline_idata, "Health", ax=axs[2]
);
axs[0].set_title("Predictions over the Range of Observed Covariate Values");We can pull these types of values out into a table. Note here how we ask for predictions based on varying values of the Edu and Income variables where we keep the Health variable fixed at the mean value.
summary_df = bmb.interpret.predictions(
spline_model,
spline_idata,
conditional={
"Edu": list(np.linspace(0.5, 1, 10)),
"Income": list(np.linspace(0.4, 1, 10)),
},
)
summary_df| Edu | Income | Health | estimate | lower_3.0% | upper_97.0% | |
|---|---|---|---|---|---|---|
| 0 | 0.5 | 0.400000 | 0.885672 | -29.666186 | -1954.336608 | 2094.868474 |
| 1 | 0.5 | 0.466667 | 0.885672 | -228.798806 | -2085.499823 | 1522.448806 |
| 2 | 0.5 | 0.533333 | 0.885672 | -21.580853 | -1613.700514 | 1468.840651 |
| 3 | 0.5 | 0.600000 | 0.885672 | -57.471006 | -1597.034080 | 1581.365824 |
| 4 | 0.5 | 0.666667 | 0.885672 | -42.923777 | -1539.633756 | 1595.098933 |
| ... | ... | ... | ... | ... | ... | ... |
| 95 | 1.0 | 0.733333 | 0.885672 | 514.862194 | 438.494673 | 587.329605 |
| 96 | 1.0 | 0.800000 | 0.885672 | 568.402239 | 474.848846 | 663.243239 |
| 97 | 1.0 | 0.866667 | 0.885672 | 566.889704 | 479.214581 | 657.933780 |
| 98 | 1.0 | 0.933333 | 0.885672 | 289.886165 | 85.237681 | 495.492796 |
| 99 | 1.0 | 1.000000 | 0.885672 | -1001.019928 | -2494.557415 | 614.522981 |
100 rows × 6 columns
We can then plot these results based on the conditional realisations and see some interesting behaviour at the implausilble reaslisations. This should make us somewhat wary of using this model to extrapolate too far beyond the observable range of data
g = sns.relplot(data=summary_df, x="Income", y="estimate", hue="Edu")
g.fig.suptitle("Marginal Predictions of the Outcome Variable \n conditional on counterfactual values for Edu and Income", y=1.05);The Spline Component Contributions
Finally, we’ll pull out the component contributions of each variable and see how they combine additively. This will also serve as a kind of posterior predictive check for each country as we can show the degree which posterior draws from each component sum to achieve a plausible mirror of the observed data.
Bincome = spline_model.response_component.design.common['bs(Income, degree=3, knots=knots_income)']
income_coefs = az.extract(spline_idata['posterior']['bs(Income, degree=3, knots=knots_income)'])['bs(Income, degree=3, knots=knots_income)']
Bedu = spline_model.response_component.design.common['bs(Edu, degree=3, knots=knots_edu)']
edu_coefs = az.extract(spline_idata['posterior']['bs(Edu, degree=3, knots=knots_edu)'])['bs(Edu, degree=3, knots=knots_edu)']
Bhealth = spline_model.response_component.design.common['bs(Health, degree=3, knots=knots_health)']
health_coefs = az.extract(spline_idata['posterior']['bs(Health, degree=3, knots=knots_health)'])['bs(Health, degree=3, knots=knots_health)']
income = np.dot(Bincome, income_coefs).T
edu = np.dot(Bedu, edu_coefs).T
health = np.dot(Bhealth, health_coefs).T
intercept = az.extract(spline_idata['posterior']['Intercept'])['Intercept'].values
fig, ax = plt.subplots(figsize=(10, 7))
for i in range(100):
if i == 1:
ax.plot(income[i], label='Income Component', color='red')
ax.plot(edu[i], label='Edu Component', color='blue')
ax.plot(health[i], label='Health Component', color='darkgreen')
ax.plot(intercept[i] + income[i] + edu[i] + health[i], label='Combined Components', color='purple')
else:
ax.plot(income[i], alpha=0.1, color='red')
ax.plot(edu[i], alpha=0.1, color='blue')
ax.plot(health[i], alpha=0.1, color='darkgreen')
ax.plot(intercept[i] + income[i] + edu[i] + health[i], color='purple', alpha=0.3)
ax.scatter(range(len(spline_idata['observed_data']['Overall'])), spline_idata['observed_data']['Overall'], label='Observed', color='grey', s=56, ec='black')
ax.set_title("Additive Spline Components", fontsize=20)
ax.legend();
ax.set_xticklabels(pisa_df.dropna(axis=0).reset_index()['Country']);We can see here how the combined components borrow the structure of the outcome variable primarily from the income variable component. The health measures have closer to zero additive contribution while the uncertainty in the educational component varies wildly. But the blue educational component here is used primarily as a scaling contribution which adds to the level of the outcome variable rather than distorting the shape.
Conclusion
We’ve seen the application of splines as univariate smoothers to approximate wiggly curves of arbitrary shape. We’ve also tried gaussian process approximations of the same univariate functions. The suggested flexibility of both methods is a strength, but we need to be careful where splines have a tendency to over-fit to individual curves. As such we have tried to show that we can incorporate spline basis modelling in a hierarchical bayesian model and recover more compelling posterior predictive checks and additionally derive predictiions from the mixed variant of the hierarhical model which helps us understand the implications of the data for generic forecasts of insurance loss curves. Finally we showed how splines can be used additively to model non-linear functions of multiple covariates. These are a powerful tool to interrogate complex non-linear relationships, but they offer interpolation of functions within a well understood domain. Applealing to these models for extrapolation needs to be done carefully.
Citation
@online{forde2024,
author = {Nathaniel Forde},
title = {GAMs and {GPs:} {Flexibility} and {Calibration}},
date = {2024-04-07},
langid = {en}
}